library(tidyverse)
library(plotly)
library(sf)
library(tigris)
library(leaflet)
library(censusapi)
library(mapview)

Sys.setenv(CENSUS_KEY="c8aa67e4086b4b5ce3a8717f59faa9a28f611dab")

Loading in parcel and building footprints

Parcel shapes are generally found on public sites; for San Mateo County: https://isd.smcgov.org/gis-data-download. These have already been downloaded to the G drive and stored as:

smc_parcels <- readRDS("/Volumes/GoogleDrive/Shared drives/SFBI/Data Library/Parcels/smc_parcels.rds")

Building footprints come either from OpenStreetMap (https://www.geofabrik.de/data/download.html) or from Microsoft (https://github.com/Microsoft/USBuildingFootprints). I tend to prefer OSM, but haven’t formally done any review of which is more ground-truth accurate in the region.

OSM is stored on G drive. Microsoft is easy to download from https://drive.google.com/drive/folders/1-XGvS25tQKKQ3HTqWjAfLJ4PaeXJ9yyY (Patty Frontiera, Berkeley)

osm_bldg <- readRDS("/Volumes/GoogleDrive/Shared drives/SFBI/Data Library/OSM/osm_bldg.rds")

Filter both to just NFO and save to our repo.

nfo_boundary <- 
  places("CA", cb = T, progress_bar = F) %>% 
  filter(NAME == "North Fair Oaks")
nfo_parcels <- smc_parcels[nfo_boundary, ]
nfo_bldgs <- osm_bldg[nfo_boundary, ]

saveRDS(nfo_parcels, "nfo_parcels.rds")
saveRDS(nfo_bldgs, "nfo_bldgs.rds")

Visualize, inspect

ggplot(nfo_parcels) + geom_sf()

If you use mapview(nfo_parcels) in your Console, it gives a nice, simple leaflet-style view, but the package currently seems to cause errors when knitting to a web page, so the ggplot() method is bland but quick.

From mapview() inspection, it looks like there are some double-counted parcels (see the darker regions). This might be because condominiumized parcels are each plotted with the same overall parcel boundary. We’ll de-duplicate after joining with the Assessor data.

ggplot(nfo_bldgs) + geom_sf()

Loading in property data and joining to parcels

We have Assessor’s data directly from the County through our relationships. https://www.smcacre.org/assessor

In G Drive, SFBI-Restricted/SMC Assessor, there are many files. secmstr is the formal “assessor roll” data that includes ownership, property value, etc. Then there are “building characteristics” files broken up by type: resident for single family, multifam, condo, office, hotel, industrial, mobilehome. smc_all.Rdata holds all the various individual datasets in an R environment. smc_assessor_full combines all the data into one file, joined by APN (parcel number). This yields 213 fields, so it’s quite unwieldy; it’s worth spending some time exploring the “format” spreadsheets there to understand what information is available, and to focus in on what should actually be used for our analyses.

Note that smc_assessor_full already has the parcel shapes pre-included, so we didn’t actually need the previous step.

smc_assessor_full <- readRDS("/Volumes/GoogleDrive/Shared drives/SFBI-Restricted/SMC Assessor/smc_assessor_full.rds")

nfo_parcels <- 
  smc_assessor_full[nfo_boundary, ]

saveRDS(nfo_parcels, "nfo_parcels.rds")

#write_csv() can't store the geometry list column, so drop it first
nfo_parcels %>% 
  st_drop_geometry() %>% 
  write_csv("nfo_parcels.csv")
ggplot(nfo_parcels) + geom_sf()

See what the duplicated geometries were.

duplicates <-
  nfo_parcels %>% 
  as.data.frame() %>% 
  filter(
    geometry %in% (
      nfo_parcels %>% 
         as.data.frame() %>% 
         filter(duplicated(geometry)) %>% 
         pull(geometry)
     )
  )

table(duplicates$`PUC CODE`)
## 
## 12 97 
##  2 74

According to “PROPERTY CODE.pdf” in the G drive, 12 is “store & office” and 97 is “residential condo”. We should ask Canopy generally whether any kinds of parcels are restricted (for example, are only single-family homes allowed to receive trees?). We haven’t explicitly asked this yet.

nfo_parcels <-
  nfo_parcels %>% 
  as.data.frame() %>% 
  filter(!duplicated(geometry)) %>% 
  st_as_sf()

ggplot(nfo_parcels) + geom_sf()

Note that it may be worth doing a closer inspection of the region to identify parcels that should be considered special. For example, if a parcel is a park or public property that can be directly walked on, it shouldn’t be analyzed in the same way. Note there is a Hetch Hetchy pipe easement through the neighborhood, some of which is walkable area where folks have built urban gardens; it probably should be treated in a special way. Some of this will be knowable once joined with Assessor data.

And as I noted, it’s best to remove unnecessary fields at this point to make the data more wieldy.
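A minimal sketch of that trimming; APN and `PUC CODE` appear elsewhere in this doc, but confirm any other fields you keep against the “format” spreadsheets before relying on them:

```r
#Keep only the fields used downstream (plus geometry, which select() retains for sf objects)
nfo_parcels_trim <-
  nfo_parcels %>% 
  select(APN, `PUC CODE`)
```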

Loading in the sidewalk network

I’ll provide demos of actual sidewalk data soon, as it relates to getting foot traffic data, but first, here’s a convenient technique: if you merge the original parcels together, they form the “block boundary”.

nfo_blocks <-
  nfo_parcels %>% 
  st_union()
  
ggplot(nfo_blocks) + geom_sf()

This can be further cleaned. But if you buffer this X feet inward, the result becomes the “inside part” of parcels, which isn’t optimal for sidewalk shading.

We need a projection that uses planar coordinates, in units of feet.

projection <- "+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=ft +no_defs"

nfo_nonsidewalk_yardarea <-
  nfo_blocks %>% 
  st_transform(projection) %>% 
  st_buffer(-10) %>% 
  st_transform(4269)

Doing geospatial manipulations of geometries to narrow down to a portion of interest

Let’s go ahead and buffer out the building footprints, since trees can’t be too close to buildings. Assume 4 ft for now.

nfo_bldgs_buffer <-
  nfo_bldgs %>% 
  st_transform(projection) %>% 
  st_buffer(4) %>% 
  st_transform(4269)

This demonstrates how to clip.

nfo_yards <-
  nfo_parcels %>% 
  st_difference(st_union(nfo_bldgs_buffer))

ggplot(nfo_yards) + geom_sf()

And clipping the non-sidewalk portions.

nfo_yards_sidewalk_adjacent <-
  nfo_yards %>% 
  st_difference(st_union(nfo_nonsidewalk_yardarea))

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>% 
  addPolygons(
    data = nfo_yards_sidewalk_adjacent,
    color = "black",
    weight = 0.5,
    fillColor = "green",
    label = ~APN
  )

There may be other important geometry considerations, like tree well size or some way to estimate driveway/paved portions (curb cut data possibly exists out there; there could also be assessor fields that help us guess driveway width based on garage size). Some parcels may already have been eliminated based on assessor data. Shading orientation relative to the sun could also be useful to factor in. Otherwise, the next section of analysis would start considering foot traffic and other transportation points of interest, like bus stops.

Loading in Safegraph data to identify origin-destination movement behavior

Before we start manipulating the Safegraph data, we first need the census block groups (CBGs) located in North Fair Oaks. In this demo, we will only consider points of interest (POIs) that are within the NFO boundary, but this could be expanded to include POIs in neighboring communities that are within walking distance of the NFO boundary. The following chunk derives the NFO block groups from the San Mateo County block groups.

#All block groups in San Mateo County (county FIPS 081)
ca_block_groups <-
  block_groups("CA", county = "081", cb = T, progress_bar = F) %>% 
  st_set_crs(st_crs(nfo_boundary))

#Isolates NFO block groups
nfo_block_groups <-
  ca_block_groups %>% 
  st_centroid() %>% 
  .[nfo_boundary,] %>% 
  dplyr::select(GEOID,geometry) %>% 
  st_set_geometry(NULL) %>% 
  left_join(
    ca_block_groups %>% 
      dplyr::select(GEOID,geometry),
    by = "GEOID"
  ) %>%
  st_as_sf() %>% 
  st_set_crs(st_crs(nfo_boundary))
  
ggplot(nfo_block_groups) + geom_sf()

Now we will jump into Safegraph data. For important background information on Safegraph weekly patterns data, please read the bullet titled “Safegraph Weekly Patterns” in the Background section of the NFO Project Doc.

First, we will set some file path variables that we will use later in the analysis.

# SET your path here to the covid19analysis folder.
sg_path <- 'G:/Shared drives/SFBI-Restricted/Safegraph/covid19analysis/'

#This is the location of the Core POI data. Safegraph releases new core_poi files each month.
poi_url <- paste0(sg_path,'core/2020/09/Core-USA-Sep-CORE_POI-2020_08-2020-09-08/core_poi-ca.rds')

Now set file paths for the weekly patterns files. I am only reading in data in the time window of August 3 through August 30, 2020, but we have access to weekly patterns data all the way back to the beginning of 2019. Note that the naming of the file indicates which day the data starts on, and includes data for all the days in that week. Also note that we are reading a “home-panel-summary” file, which is explained in the background section of the project doc.

#August 03 2020
patterns_W200803 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-03-weekly-patterns-ca.rds')

hps_W200803 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-03-home-panel-summary.rds')

#August 10 2020
patterns_W200810 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-10-weekly-patterns-ca.rds')

hps_W200810 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-10-home-panel-summary.rds')


#August 17 2020
patterns_W200817 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-17-weekly-patterns-ca.rds')

hps_W200817 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-17-home-panel-summary.rds')

#August 24 2020
patterns_W200824 <- 
  paste0(sg_path,'weekly-patterns/v2/main-file/2020-08-24-weekly-patterns-ca.rds')

hps_W200824 <- 
  paste0(sg_path,'weekly-patterns/v2/home-summary-file/2020-08-24-home-panel-summary.rds')

We will now read in the file that includes the normalization function. This file was pre-created to normalize the visits in the Safegraph data to account for each block group’s population. It sits in a different GitHub repository, so make sure you clone the covid19 repo before running the next chunk. The .R file is in a separate working directory, so adjust the file path below as necessary to point to the correct location on your machine. Note that you might not be able to run this source() command from the Source window (something odd about it being within a chunk), but you can run it by copying it to the Console.

setwd("~/GitHub/covid19/safegraph_processing")

source('safegraph_process_patterns_functions.R')

setwd("~/GitHub/trees")
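If the setwd() juggling gets fragile, an alternative is base R’s source(..., chdir = TRUE), which temporarily switches to the sourced file’s directory and switches back automatically (the path below assumes the same clone location; adjust for your machine):

```r
#chdir = TRUE makes any relative paths inside the sourced script resolve
#against its own directory, without changing our working directory
source(
  "~/GitHub/covid19/safegraph_processing/safegraph_process_patterns_functions.R",
  chdir = TRUE
)
```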

The following chunk reads and processes the weekly patterns files, and extrapolates the origin-destination (OD) visit pairs. See the in-line comments for more information.

Note that we likely want a wider catchment area than just the POIs literally in NFO: something that would represent the set of POIs that NFO residents could realistically walk to.
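One simple version of that wider catchment, sketched here with an assumed half-mile (2640 ft) walking buffer around the NFO boundary, reusing the ft-based projection defined earlier:

```r
#Hypothetical catchment: buffer the NFO boundary by an assumed half mile.
#This could replace nfo_boundary when spatially filtering POIs.
nfo_catchment <-
  nfo_boundary %>% 
  st_transform(projection) %>% 
  st_buffer(2640) %>% 
  st_transform(4269)
```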

poi_ca <- readRDS(poi_url) #read core_poi file

#Function that reads in each patterns file, and filters/cleans the dataset
process_patterns <- function(patterns){
  
  #Load the SafeGraph patterns dataset.
  sg_date <- readRDS(patterns)
  
  #Subset to POIs in North Fair Oaks and attach location/category info from core_poi
  sg_nfo_businesses <- 
    sg_date %>% 
    filter(city == "North Fair Oaks") %>% 
    left_join(
      poi_ca %>% 
        select(
          safegraph_place_id,
          latitude,
          longitude,
          top_category,
          sub_category
        ),
      by = "safegraph_place_id"
    ) 
  
  return(sg_nfo_businesses)
}

#Processes patterns for August 2020
patterns_0803 <- process_patterns(patterns_W200803)
patterns_0810 <- process_patterns(patterns_W200810)
patterns_0817 <- process_patterns(patterns_W200817)
patterns_0824 <- process_patterns(patterns_W200824)

#Read home panel summary file
hps_0803 <- readRDS(hps_W200803)
hps_0810 <- readRDS(hps_W200810)
hps_0817 <- readRDS(hps_W200817)
hps_0824 <- readRDS(hps_W200824)

#By using the normBG function from safegraph_process_patterns_functions.R, we can get the origins of visitors (at the CBG level). 
#There is also a function in that file for obtaining daily origin visit information, but for the purposes of this analysis
#we will use weekly visits from origins, and later take the average across the four weeks.
origins_0803 <- normBG(patterns_0803,hps_0803)
origins_0810 <- normBG(patterns_0810,hps_0810)
origins_0817 <- normBG(patterns_0817,hps_0817)
origins_0824 <- normBG(patterns_0824,hps_0824)

#Combine all of the origin/destination data for August
cumul_origins <-
  origins_0803 %>% 
  rbind(origins_0810) %>% 
  rbind(origins_0817) %>% 
  rbind(origins_0824) 

#At this point, some origin_census_block_group fields are NA. As explained in the background section of the project doc, this is because CBGs that contributed only one visitor are not recorded.
#Here, we filter to only CBGs in NFO.
#Next, we have lower and upper estimates of visit counts. To get one visit value, we simply take the average of the two.
#We also omit the rows that have NA as an origin census block group.
#Finally, we group by origin/destination and take the average.
cumul_origins_grouped <-
  cumul_origins %>% 
  filter(origin_census_block_group %in% nfo_block_groups$GEOID) %>% 
  mutate(
    origin_visitor_counts = (origin_visitor_counts_high + origin_visitor_counts_low) / 2
  ) %>% 
  select(
    safegraph_place_id,
    origin_census_block_group,
    location_name,
    origin_visitor_counts
  ) %>% 
  na.omit() %>% 
  group_by(
    safegraph_place_id,
    origin_census_block_group,
    location_name
  ) %>% 
  summarise(
    mean_origin_visitor_counts = mean(origin_visitor_counts)
  )

At this point, we have average weekly origin/destination visit values for the past four weeks for CBGs in NFO. Now, we are going to take this a step further and derive origin/destination visit values at the parcel level.

#Pulls the first block group from the dataframe we just created to use as an example.
example_cbg <- cumul_origins_grouped$origin_census_block_group[1]

#Filters the safegraph origin visit dataframe to only include the example CBG
origins_example_filtered <-
  cumul_origins_grouped %>% 
  filter(origin_census_block_group %in% example_cbg)

#Pulls the geometry of the example CBG from the nfo_block_groups dataframe
example_bg_geom <-
  nfo_block_groups %>% 
  filter(GEOID %in% example_cbg)

#Uses geospatial techniques to isolate parcels within the example CBG
example_parcels <-
  nfo_parcels %>% 
  st_centroid() %>% 
  .[example_bg_geom,] %>% 
  st_set_geometry(NULL) %>% 
  select(APN) %>% 
  mutate(origin_census_block_group = example_cbg) %>% 
  left_join(
    nfo_parcels %>% 
      select(APN),
    by = "APN"
  ) %>% 
  st_as_sf()

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>% 
  addPolygons(
    data = example_bg_geom,
    stroke = F
  ) %>% 
  addPolygons(
    data = example_parcels,
    color = "black",
    weight = 1
  )

Note that you would want to filter out the non-residential parcels at this stage.
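A sketch of that filter, joining the `PUC CODE` field back onto the example parcels. The vector of residential codes is a placeholder: 97 (“residential condo”) is known from the earlier inspection, but the full residential set should be confirmed against PROPERTY CODE.pdf.

```r
#Placeholder: extend with the other residential codes from PROPERTY CODE.pdf
residential_puc_codes <- c("97")

example_parcels_res <-
  example_parcels %>% 
  left_join(
    nfo_parcels %>% 
      st_drop_geometry() %>% 
      select(APN, `PUC CODE`),
    by = "APN"
  ) %>% 
  filter(`PUC CODE` %in% residential_puc_codes)
```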

Finally, we create parcel-level OD pairs. Assume the Safegraph trip data gets equally divided across all available residential parcels in the CBG.

example_parcels_patterns <-
  example_parcels %>% 
  left_join(
    origins_example_filtered,
    by = "origin_census_block_group"
  ) %>% 
  mutate(
    parcel_avg_weekly_visits = mean_origin_visitor_counts / nrow(example_parcels)
  )

To do the OSRM routing, we’ll need to attach lat/long coordinates for both the parcel origin and the POI destination to this dataframe.
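A sketch of attaching those coordinates: parcel centroids for the origin side, and latitude/longitude re-joined from the core_poi file for the destination side. st_centroid() will warn about lat/long data, but is fine at this scale.

```r
example_od_coords <-
  example_parcels_patterns %>% 
  st_centroid() %>% 
  mutate(
    origin_lng = st_coordinates(.)[, 1],
    origin_lat = st_coordinates(.)[, 2]
  ) %>% 
  st_drop_geometry() %>% 
  left_join(
    poi_ca %>% 
      select(safegraph_place_id, latitude, longitude),
    by = "safegraph_place_id"
  )
```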

Loading in Census/other data to estimate walking behavior

Using OSRM to do walking routing
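As a preview, the osrm package wraps the OSRM HTTP API. A minimal sketch of routing one OD pair on foot; the server URL, coordinates, and profile here are all assumptions (the public demo server only serves the driving profile, so a self-hosted instance with the foot profile is assumed), and the exact osrmRoute() argument format varies by package version:

```r
library(osrm)

#Assumed self-hosted OSRM instance built with the foot profile
options(osrm.server = "http://localhost:5000/", osrm.profile = "foot")

#Hypothetical single OD pair, as lon/lat
route <- osrmRoute(
  src = c(-122.20, 37.47),   #parcel centroid (assumed)
  dst = c(-122.21, 37.48)    #POI location (assumed)
)

#route carries walking duration (minutes) and distance, which would feed the foot traffic analysis
```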

Consider transit data